MORE ACCURATE USE OF COMPOSITION SCALES



EDWARD WILLIAM DOLCH, Jr. 
University of Illinois 



Composition scales are said to measure "general excellence" 
in writing, the different samples, or steps, being supposed to repre- 
sent different quantities of this excellence. Use of these scales is 
admittedly more inaccurate than use of other educational measuring 
instruments, but it is universally granted that no one would expect 
judges to differentiate as accurately between quantities of "general 
excellence" in writing as between amounts of ability in arithmetic, 
history, or other school subjects. This point of view, though cor- 
rect, ignores altogether two other reasons for inaccuracy in judging 
composition that are far more important. We will give here an 
explanation of these other factors together with an account of how, 
in a particular experiment, they were quite satisfactorily overcome. 

It is a little-recognized fact that very few persons, even experienced
English teachers, see the same qualities in any sample on a
composition scale or in any theme that they are trying to score with 
the scale. This might be expected of persons not trained to per- 
ceive qualities of composition, but it is true even of the English 
teachers themselves, for they have been trained, or have trained 
themselves, to note especially certain qualities rather than others. 
Some almost instinctively note all errors in punctuation; others 
are most likely to see nothing but accuracy and elegance in choice of 
words. Some pounce at once upon the worth and soundness of the 
thought expressed; others see only the form of expression. It is 
not that they are unable to make a complete analysis of any piece of 
writing. It is merely that by temperament, or training, or teaching 
practice they have become accustomed to singling out certain things 
for attention rather than others. In correcting their own papers 
they do just this thing day after day, and therefore they do it in 
using a scale also. 
This second cause of inaccuracy has a much more disturbing
effect than mere inability to definitely determine amounts of "gen- 
eral excellence." Hand copies of any composition to a number of 
English teachers to place on a scale. One will start placing the 
composition according to its style, another according to its spelling, 
another according to its unity, another according to its choice of 
words, and so on. Now if all were placing it according to a single 
quality, such as unity of thought, for instance, there would of course 
exist the difficulty of accurate discrimination of amount of this one 
quality. But when all are placing it on different qualities, or even 
different combinations of qualities, the chances of agreement become 
slim indeed. 

In contradiction to the above, makers of certain composition 
scales insist that the various qualities of writing correlate with one 
another very well, so that even if different users of a scale are 
emphasizing different things, they will locate any composition at 
about the same place. We will have some evidence upon this point 
a little later, but it should be noted that the authors of three dif- 
ferent scales recognize the tendency to see certain qualities and 
ignore others by making a special attempt to meet this difficulty. 
The Harvard-Newton Scale lists after each sample the especially 
good or bad qualities that are claimed to be present in it and that 
presumably the user of the scale should look for in the composition 
he is placing. The Willing Scale insists that the user note separately 
"form value" and "story value" and where the two do not correlate 
directs him to place the composition for each and average the two 
to get the score of the whole. The Minnesota Composition Scale 
differentiates between structure, mechanics, and thought content, 
and gives a number of points under each of the three that should be 
considered by the scorer. Certainly the authors of these scales 
admit the likelihood that even competent persons would not notice 
the same qualities either in the samples of the scale or in the composi- 
tion being scored. 

It will be of interest for our purpose to remark that the notes 
after the samples on the Harvard-Newton Scale are themselves a 
very excellent example of this failure to continually keep in mind the 
same qualities of writing. On the chart below we have listed all 
the qualities mentioned in the notes on the exposition scale. Consider
first the enormous number of these and that others not listed
will at once occur to you. Then notice that only one item, "Order
of Material," is mentioned with all six samples of the scale. One
other, "spelling," is mentioned with five samples. Four others are
mentioned with four samples, and all the others with a lesser number.
Eight qualities are mentioned with but a single sample, being
ignored in each case in the five others.

TABLE I

Analysis of Notes at End of Samples on Harvard-Newton
Exposition Scale

Qualities Mentioned (in Language Used by the Notes)

Truth of the facts (knowledge of subject)

Difficulty of subject
Keeping to the subject

Order of material
Selection of detail
Introduction and conclusion as evidence of plan
Sense of form
Paragraphing
Care in revision

Maturity of expression
Grasp of elem. principles of comp.
Clearness
Coherence

Excellence of style
Ease of expression

General sentence structure
Variety of sentence structure
Smoothness
Transitions and connectives

Avoidance of repetition
Choice of words
Accuracy in meaning of words

Punctuation
Spelling
General mechanical correctness
Grammar: use of pronouns
Grammar: shift of mood

[The columns marking which of Samples A to F mention each quality, and the row of totals for each sample, are not legible in this copy.]

Number of qualities mentioned is 27. There is obviously repetition and
overlapping in the list, but the words of the scale are used just as given.

A further and still more important reason for inaccuracy in use 
of composition scales is the enormous difference in stress placed by 
different persons upon different excellencies or faults in writing. 
Spelling is a good example. A person who is himself a good speller 
will place a theme full of misspelled words far down on any scale. 
A person, even an English teacher, who is a poor speller himself, will 
penalize very little for errors in spelling made in impromptu work, 
rating the theme on its other qualities almost altogether. It is the 
same with other qualities. In some teachers' eyes, a single comma
fault will damn a piece of writing altogether; others will call the 
error a slip and think little of it. The use of slang is to some persons 
a capital crime; others, who may use a great deal of slang them- 
selves, smile and make but slight deduction for it. Perhaps the 
worst instance of all is the presence or lack of imagination. Some 
persons feel that the slightest touch of imaginative power should 
raise any theme to first place regardless of mechanical errors; others 
believe so firmly in accuracy first that they grimly mark down for 
carelessness in details, no matter what signs of budding genius may 
appear. 

This inaccuracy or error introduced by different valuation of 
qualities or defects in writing may often be absent altogether, but 
when it does enter into the scoring of a theme, it is bound to make 
enormous discrepancies in the placing given on any composition 
scale. 

The three sources of inaccuracy were met in a recent study of 
university Freshman English themes and an account of how they 
were handled may be of value to others doing the same kind of work. 
The scale used in this instance was the Thorndike Extension of the 
Hillegas Scale. This one was chosen because on the upper half of 
it are a number of samples which in subject-matter are quite like 
the themes written by the Freshmen for this study. 

Beginning upon a practice series of these themes, the judges set 
out to score them, working together. Theme No. 1 was taken up 
and each was asked to make up his mind what score on the scale 
before him he would give it. The scores thus given were then called 
for. To our surprise, they ranged over two steps of the scale. We
were much astonished, for we felt we were all competent judges, 
and should agree. After a little argument we made a compromise 
and went on. The same thing happened with No. 2 and No. 3 and 
No. 4. The group became irritated. Why should not competent 
teachers, with presumably equal abilities and the same ideals, be 
able to do such a simple thing as place a theme on a composition 
scale on the first attempt, without having to explain to each other 
why it should be here and not there? For the ones who graded low
would at once raise the score when their attention was called to 
certain excellencies, and the high graders would at once come down 
when they were made to notice the defects. It was not that we 
really did not agree. It was that we did not see the same things 
when they were right before us. 

A way out of the difficulty was suggested by a little detail which 
came to our attention in the work. We found ourselves often 
mentally giving the theme under consideration a score after we had 
read only the first few sentences. Certainly such a practice was 
wrong if the whole theme was being considered. We also found 
ourselves scoring the theme just as soon as we ran across certain 
gross errors. They seemed to blot all further considerations out of 
our minds. These and other indications soon showed that we were 
placing the themes on the scale according to their most prominent fea- 
tures; and often what would be prominent to one judge would not 
strike the others at all. This applied both to the samples of the scale 
and the themes being judged. That is, there were as many scales 
as there were judges, and generally just as many themes, even 
though we were looking at identical copies. 

Setting to work to correct the difficulty, we evolved Table II. 
We found in the work that at least three things struck our attention 
forcibly, though seldom were all three prominent in any single 
theme. We noted the special technique of the kind of writing we 
judged (which in this case was systematic, straightforward exposi- 
tion). We noted the maturity and smoothness of the sentences 
used. And we noted errors in grammar, spelling, punctuation, 
sentence structure, etc. Other special qualities appeared in occa- 
sional themes, but not frequently enough to require recognition. 
Studying the samples on the scale, we came to an agreement as to
just how much of the three qualities was characteristic of each step.
As we were using only those from 50 up, we worked up and down
through the samples for each quality separately and phrased as
definite a description as we could of the distinction between each
step and the one above and the one below it in terms of the single
quality in mind. It should be noted that this description cannot
take the place of the actual concrete samples on the scale; it proved,
however, an invaluable addition to them, as we were constantly
checking our impression of each sample by the description and in
turn making the description have real meaning by restudying the
samples.

TABLE II
Specification Sheet
(For the steps on the Hillegas Scale, a device to make it objective and to show
its relation to the kind of exposition done by university Freshmen.)

60

Systematic Thinking. Either no apparent plan, or marked violations of
unity and coherence in the whole.

Maturity of Sentence Structure. "Childish" sentences: short, monotonous,
with practically no inverted order or parentheses. Or long sentences
just "stuck together."

Errors. Numerous errors of all kinds short of sheer illiteracy. An
occasional comma splice or fragment used as a sentence.

70

Systematic Thinking. A moment's study reveals a definite plan or
succession of steps in thought, but the structure is not plain enough to be
comprehended at first reading. Or a bare outline.

Maturity of Sentence Structure. Still some monotony of sentence structure,
but varied by occasional long sentences or inverted order. Or quite long
sentences unsuccessfully handled.

Errors. Some awkward sentences and some misspelling, but no illiterate
blunders.

80

Systematic Thinking. A well-defined plan that is perceived on first reading
and that is smoothly indicated by words of back-and-forward reference.

Maturity of Sentence Structure. No monotony felt. Variety by change
in length, by inverted order, and by parentheses. Long sentences well
handled.

Errors. None except occasional misspelling of more unusual words.
Occasional difficulties due to length of sentences.

90

The qualities of 80 in higher degree plus elegancies of diction and style.

By use of the "specification sheet," we saw to it that the error 
of scoring on different points by different judges was not committed, 
as each judge had to score on all three, no matter with which one he 
began. This specification sheet was made out, it must be under- 
stood, for our particular set of themes only. We were dealing only 
with straightforward, matter-of-fact exposition of ideas, nearly 
always by the process of enumeration. The writers were college 
Freshmen. For them, therefore, this specification or list of items 
appearing in the scale fitted fairly well. For other work written by 
other pupils, no doubt a somewhat different set of qualities would 
have to be pointed out and described. The main idea is that for 
the judges to do anything like consistent or satisfactory work, they 
must have a common understanding of what qualities they are scor- 
ing on and how the steps of the scale differ in those particular 
qualities. 

But the problem of evaluating the different details still remained 
unsettled. Each theme had three scores. How was the final score 
to be arrived at? Left to themselves the judges would certainly 
have evaluated the different qualities differently and each one's 
method would have varied from one theme to the next. In fact, 
without the specification sheet, the final score would have been, in 
very many cases, just one of the three, the others being forgotten. 
The method of evaluation used was a compromise, without any 
claim to settling the question. We secured the total score simply 
by averaging the three already secured. This made thinking, style 
and correctness count equally. How they actually should count 
would have to be determined by scientific scale makers; our method 
was a rough and ready one adopted merely for this occasion. 
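
As a minimal sketch of the averaging just described, and not a record of how the study itself was tabulated, the equal weighting can be written out in Python. The names `QualityScores`, `composite`, and `pooled` are hypothetical, and the scores shown are invented for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class QualityScores:
    """One scoring of a theme on the three qualities (hypothetical record)."""
    thinking: float
    sentence_structure: float
    errors: float

    def composite(self) -> float:
        # Equal weights: thinking, style, and correctness count the same.
        return mean((self.thinking, self.sentence_structure, self.errors))


def pooled(scorings):
    """Average each quality separately over a list of judges' scorings."""
    return QualityScores(
        thinking=mean(s.thinking for s in scorings),
        sentence_structure=mean(s.sentence_structure for s in scorings),
        errors=mean(s.errors for s in scorings),
    )


# Invented scores from four judges for a single theme.
judges = [
    QualityScores(70.0, 80.0, 60.0),
    QualityScores(70.0, 80.0, 70.0),
    QualityScores(80.0, 90.0, 60.0),
    QualityScores(80.0, 90.0, 70.0),
]
theme = pooled(judges)
print(theme, theme.composite())
```

Because every scoring must supply all three fields, no quality can be silently left out of the final score. A different weighting would only require changing `composite`.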

Just to show how far the three qualities considered — systematic 
thinking, maturity of sentence structure, and freedom from errors — 
actually failed to correlate, the following table has been prepared. 
The three scores received by each of the ninety-five themes (each 
score the average of four separate scorings) have been compared.
Note three things shown by the table: first, that on relatively few
themes are scores on any two qualities the same; second, that the
variation is sometimes one way, sometimes the other; third, that
the variation is in many cases quite considerable. Emphasis on
different qualities by different users of the scale would therefore
have resulted in marked disagreement between their scores and
consequently in decided inaccuracy in results.

TABLE III*

Showing Lack of Correlation between the Scores Given Ninety-five
Themes on Each of Three Separate Qualities

Points on Thorndike        Score on S.S.     Score on E.       Score on E.
Ext. of Hillegas           above or below    above or below    above or below
                           that on T.        that on T.        that on S.S.

Same score                      12                 7                14
Higher score                    61                68                50
Lower score                     22                20                31

Total number of themes          95                95                95

[The rows giving the number of themes at each difference from +14 to -14 points are not legible in this copy.]

*T. = systematic thinking. S.S. = maturity of sentence structure. E. = freedom
from errors. These columns show the number of themes with a plus or minus
variation of the number of points shown in the first column.
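
The counts of same, higher, and lower scores summarized in Table III can be obtained by a simple tally over paired scores. The following Python sketch shows one way this might be done; the function name `compare_scores` is hypothetical, and the four pairs of scores are invented and are not taken from the ninety-five themes.

```python
def compare_scores(first, second):
    """Count themes where `second` is the same as, higher than, or lower than `first`."""
    if len(first) != len(second):
        raise ValueError("both lists must hold one score per theme")
    tally = {"same": 0, "higher": 0, "lower": 0}
    for a, b in zip(first, second):
        if b == a:
            tally["same"] += 1
        elif b > a:
            tally["higher"] += 1
        else:
            tally["lower"] += 1
    return tally


# Invented scores for four themes on two qualities.
thinking = [70, 80, 60, 75]
sentence_structure = [70, 85, 55, 80]
print(compare_scores(thinking, sentence_structure))
```

The three counts always add up to the number of themes compared, which is the check that each column of Table III satisfies (for example, 12 + 61 + 22 = 95).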

Briefly, then, we have tried to point out three distinct sources of 
inaccuracy in use of English Composition scales, viz.: (1) difficulty
of discriminating between quantities of qualities in writing; (2) 
failure of even competent persons to see the same qualities in any 
piece of writing; (3) non-agreement as to relative importance of 
various qualities in writing. We have tried to show that Item No. 2 
may be met by drawing up a list of qualities, all of which must be 
separately considered and no one of which must be omitted. We 
have pointed out that Item No. 3 may be handled by some arbitrary 
system of balancing against one another the various qualities listed. 
This leaves Item No. 1, which alone will cause much less error than 
is generally attributed to it, and which can be met only by training 
in use of the scale. 

Adoption of some such analysis as this will give us new results 
in use of old scales or will enable us to secure still better and more 
scientific scales for composition. 



